---
id: quick-start
title: Quick start
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

This short tutorial will help you start scraping with Crawlee in just a minute or two. For an in-depth understanding of how Crawlee works, check out the [Introduction](../introduction/index.mdx) section, which provides a comprehensive step-by-step guide to creating your first scraper.

## Choose your crawler

Crawlee offers two main crawler classes: `BeautifulSoupCrawler`, and `PlaywrightCrawler`. All crawlers share the same interface, providing maximum flexibility when switching between them.

### BeautifulSoupCrawler

The `BeautifulSoupCrawler` is a plain HTTP crawler that parses HTML using the well-known [BeautifulSoup](https://pypi.org/project/beautifulsoup4/) library. It crawls the web using an HTTP client that mimics a browser. This crawler is very fast and efficient but cannot handle JavaScript rendering.

### PlaywrightCrawler

The `PlaywrightCrawler` uses a headless browser controlled by the [Playwright](https://playwright.dev/) library. It can manage Chromium, Firefox, Webkit, and other browsers. Playwright is the successor to the [Puppeteer](https://pptr.dev/) library and is becoming the de facto standard in headless browser automation. If you need a headless browser, choose Playwright.

:::caution before you start

Crawlee requires Python 3.9 or later.

:::

## Installation

Crawlee is available as the [`crawlee`](https://pypi.org/project/crawlee/) PyPI package.

```sh
pip install crawlee
```

Additional, optional dependencies unlocking more features are shipped as package extras.

If you plan to use `BeautifulSoupCrawler`, install `crawlee` with `beautifulsoup` extra:

```sh
pip install 'crawlee[beautifulsoup]'
```

If you plan to use `PlaywrightCrawler`, install `crawlee` with the `playwright` extra:

```sh
pip install 'crawlee[playwright]'
```

Then, install the Playwright dependencies:

```sh
playwright install
```

You can install multiple extras at once by using a comma as a separator:

```sh
pip install 'crawlee[beautifulsoup,playwright]'
```

## Crawling

Run the following example to perform a recursive crawl of the Crawlee website using the selected crawler.

<Tabs groupId="main">
<TabItem value="BeautifulSoupCrawler" label="BeautifulSoupCrawler">

```python
import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    # BeautifulSoupCrawler crawls the web using HTTP requests and parses HTML using the BeautifulSoup library.
    crawler = BeautifulSoupCrawler(max_requests_per_crawl=50)

    # Define a request handler to process each crawled page and attach it to the crawler using a decorator.
    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        # Extract relevant data from the page context.
        data = {
            'url': context.request.url,
            'title': context.soup.title.string if context.soup.title else None,
        }
        # Store the extracted data.
        await context.push_data(data)
        # Extract links from the current page and add them to the crawling queue.
        await context.enqueue_links()

    # Add first URL to the queue and start the crawl.
    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```

</TabItem>
<TabItem value="PlaywrightCrawler" label="PlaywrightCrawler">

```python
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    # PlaywrightCrawler crawls the web using a headless browser controlled by the Playwright library.
    crawler = PlaywrightCrawler()

    # Define a request handler to process each crawled page and attach it to the crawler using a decorator.
    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        # Extract relevant data from the page context.
        data = {
            'url': context.request.url,
            'title': await context.page.title(),
        }
        # Store the extracted data.
        await context.push_data(data)
        # Extract links from the current page and add them to the crawling queue.
        await context.enqueue_links()

    # Add first URL to the queue and start the crawl.
    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```

</TabItem>
</Tabs>

When you run the example, you will see Crawlee automating the data extraction process in your terminal.

{/* TODO: improve the logging and add here a sample */}

## Running headful browser

By default, browsers controlled by Playwright run in headless mode (without a visible window). However, you can configure the crawler to run in a headful mode, which is useful during development phase to observe the browser's actions. You can alsoswitch from the default Chromium browser to Firefox or WebKit.

```python
# ...

async def main() -> None:
    crawler = PlaywrightCrawler(
        # Run with a visible browser window.
        headless=False,
        # Switch to the Firefox browser.
        browser_type='firefox'
    )

    # ...
```

When you run the example code, you'll see an automated browser navigating through the Crawlee website.

{/* TODO: add video example */}

## Results

By default, Crawlee stores data in the `./storage` directory within your current working directory. The results of your crawl will be saved as JSON files under `./storage/datasets/default/`.

To view the results, you can use the `cat` command:

```sh
cat ./storage/datasets/default/000000001.json
```

The JSON file will contain data similar to the following:

```json
{
    "url": "https://crawlee.dev/",
    "title": "Crawlee · Build reliable crawlers. Fast. | Crawlee"
}
```

:::tip

If you want to change the storage directory, you can set the `CRAWLEE_STORAGE_DIR` environment variable to your preferred path.

:::

## Examples and further reading

For more examples showcasing various features of Crawlee, visit the [Examples](/docs/examples) section of the documentation. To get a deeper understanding of Crawlee and its components, read the step-by-step [Introduction](../introduction/index.mdx) guide.

[//]: # (TODO: add related links once they are ready)
